Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 8693 |
| Missing cells | 200 |
| Missing cells (%) | 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 128.0 B |
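The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that total memory is roughly the average record size times the number of observations; both are assumptions about how the report derives these values, not documented behaviour.

```python
# Values copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 8693
missing_cells = 200
avg_record_bytes = 128.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is average record size times row count, in MiB.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions both rounded values agree with the table.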
Variable types
| Categorical | 8 |
|---|---|
| Numeric | 8 |
Alerts
| Alert | Type |
|---|---|
| PassengerId has a high cardinality: 8693 distinct values | High cardinality |
| Name has a high cardinality: 8473 distinct values | High cardinality |
| Cabin is highly overall correlated with HomePlanet and 1 other fields | High correlation |
| Cabin_num is highly overall correlated with HomePlanet and 1 other fields | High correlation |
| HomePlanet is highly overall correlated with Cabin and 2 other fields | High correlation |
| CryoSleep is highly overall correlated with Transported | High correlation |
| RoomService is highly overall correlated with CryoSleep | High correlation |
| FoodCourt is highly overall correlated with CryoSleep | High correlation |
| ShoppingMall is highly overall correlated with CryoSleep | High correlation |
| Spa is highly overall correlated with CryoSleep | High correlation |
| VRDeck is highly overall correlated with CryoSleep | High correlation |
| Destination is highly overall correlated with HomePlanet | High correlation |
| Transported is highly overall correlated with CryoSleep | High correlation |
| Name has 200 (2.3%) missing values | Missing |
| PassengerId is uniformly distributed | Uniform |
| Name is uniformly distributed | Uniform |
| PassengerId has unique values | Unique |
| Cabin has 256 (2.9%) zeros | Zeros |
| Age has 178 (2.0%) zeros | Zeros |
| RoomService has 5577 (64.2%) zeros | Zeros |
| FoodCourt has 5471 (62.9%) zeros | Zeros |
| ShoppingMall has 5683 (65.4%) zeros | Zeros |
| Spa has 5324 (61.2%) zeros | Zeros |
| VRDeck has 5497 (63.2%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-02 01:19:44.043068 |
|---|---|
| Analysis finished | 2022-12-02 01:20:03.762876 |
| Duration | 19.72 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
PassengerId
Categorical
| Distinct | 8693 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 0001_01 | 1 |
|---|---|
| 7999_01 | 1 |
| 8014_01 | 1 |
| 8012_03 | 1 |
| 8012_02 | 1 |
| Other values (8688) |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 7 |
| Min length | 7 |
Characters and Unicode
| Total characters | 60851 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 8693 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 0001_01 |
|---|---|
| 2nd row | 0002_01 |
| 3rd row | 0003_01 |
| 4th row | 0003_02 |
| 5th row | 0004_01 |
Common Values
| Value | Count | Frequency (%) |
| 0001_01 | 1 | < 0.1% |
| 7999_01 | 1 | < 0.1% |
| 8014_01 | 1 | < 0.1% |
| 8012_03 | 1 | < 0.1% |
| 8012_02 | 1 | < 0.1% |
| 8012_01 | 1 | < 0.1% |
| 8010_01 | 1 | < 0.1% |
| 8007_02 | 1 | < 0.1% |
| 8007_01 | 1 | < 0.1% |
| 8006_01 | 1 | < 0.1% |
| Other values (8683) | 8683 |
Length
| Value | Count | Frequency (%) |
| 0001_01 | 1 | < 0.1% |
| 0044_01 | 1 | < 0.1% |
| 0003_01 | 1 | < 0.1% |
| 0003_02 | 1 | < 0.1% |
| 0004_01 | 1 | < 0.1% |
| 0005_01 | 1 | < 0.1% |
| 0006_01 | 1 | < 0.1% |
| 0007_01 | 1 | < 0.1% |
| 0008_01 | 1 | < 0.1% |
| 0008_03 | 1 | < 0.1% |
| Other values (8683) | 8683 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 12459 | |
| 1 | 9827 | |
| _ | 8693 | |
| 2 | 5017 | |
| 3 | 4039 | 6.6% |
| 4 | 3790 | 6.2% |
| 6 | 3664 | 6.0% |
| 5 | 3606 | 5.9% |
| 8 | 3557 | 5.8% |
| 7 | 3410 | 5.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 52158 | |
| Connector Punctuation | 8693 | 14.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 12459 | |
| 1 | 9827 | |
| 2 | 5017 | |
| 3 | 4039 | 7.7% |
| 4 | 3790 | 7.3% |
| 6 | 3664 | 7.0% |
| 5 | 3606 | 6.9% |
| 8 | 3557 | 6.8% |
| 7 | 3410 | 6.5% |
| 9 | 2789 | 5.3% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 8693 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 60851 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 12459 | |
| 1 | 9827 | |
| _ | 8693 | |
| 2 | 5017 | |
| 3 | 4039 | 6.6% |
| 4 | 3790 | 6.2% |
| 6 | 3664 | 6.0% |
| 5 | 3606 | 5.9% |
| 8 | 3557 | 5.8% |
| 7 | 3410 | 5.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 60851 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 12459 | |
| 1 | 9827 | |
| _ | 8693 | |
| 2 | 5017 | |
| 3 | 4039 | 6.6% |
| 4 | 3790 | 6.2% |
| 6 | 3664 | 6.0% |
| 5 | 3606 | 5.9% |
| 8 | 3557 | 5.8% |
| 7 | 3410 | 5.6% |
HomePlanet
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 0 | |
|---|---|
| 1 | |
| 2 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 4714 | |
| 1 | 2175 | |
| 2 | 1804 | 20.8% |
CryoSleep
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 5568 | |
| 1 | 3125 |
Cabin
Real number (ℝ)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.3010468 |
| Minimum | 0 |
|---|---|
| Maximum | 7 |
| Zeros | 256 |
| Zeros (%) | 2.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 6 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.7602883 |
|---|---|
| Coefficient of variation (CV) | 0.40926974 |
| Kurtosis | -0.30516724 |
| Mean | 4.3010468 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.94842497 |
| Sum | 37389 |
| Variance | 3.0986149 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5 | 2821 | |
| 6 | 2564 | |
| 4 | 1025 | 11.8% |
| 1 | 779 | 9.0% |
| 2 | 747 | 8.6% |
| 3 | 495 | 5.7% |
| 0 | 256 | 2.9% |
| 7 | 6 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 256 | 2.9% |
| 1 | 779 | 9.0% |
| 2 | 747 | 8.6% |
| 3 | 495 | 5.7% |
| 4 | 1025 | 11.8% |
| 5 | 2821 | |
| 6 | 2564 | |
| 7 | 6 | 0.1% |
| Value | Count | Frequency (%) |
| 7 | 6 | 0.1% |
| 6 | 2564 | |
| 5 | 2821 | |
| 4 | 1025 | 11.8% |
| 3 | 495 | 5.7% |
| 2 | 747 | 8.6% |
| 1 | 779 | 9.0% |
| 0 | 256 | 2.9% |
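The descriptive statistics for this variable (the one with 256 zeros, matching the Cabin alert) can be recomputed from the value counts listed above. The following is a minimal sketch that assumes the reported variance is the sample variance (denominator n − 1); the arithmetic agrees with the table under that assumption.

```python
from math import sqrt

# Value -> count, copied from the frequency table above.
value_counts = {5: 2821, 6: 2564, 4: 1025, 1: 779, 2: 747, 3: 495, 0: 256, 7: 6}

n = sum(value_counts.values())                       # number of observations
total = sum(v * c for v, c in value_counts.items())  # "Sum" in the table
mean = total / n

# Assumption: sample variance with an n - 1 denominator.
sum_sq_dev = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
variance = sum_sq_dev / (n - 1)
std = sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)
print(round(mean, 7), round(variance, 5), round(std, 5), round(cv, 5))
```

The coefficient of variation is the standard deviation divided by the mean, which is why it is dimensionless.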
Cabin_num
Real number (ℝ)
| Distinct | 1817 |
|---|---|
| Distinct (%) | 20.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 591.52905 |
| Minimum | 0 |
|---|---|
| Maximum | 1894 |
| Zeros | 22 |
| Zeros (%) | 0.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 30 |
| Q1 | 166 |
| median | 407 |
| Q3 | 983 |
| 95-th percentile | 1561 |
| Maximum | 1894 |
| Range | 1894 |
| Interquartile range (IQR) | 817 |
Descriptive statistics
| Standard deviation | 509.49978 |
|---|---|
| Coefficient of variation (CV) | 0.86132673 |
| Kurtosis | -0.66160158 |
| Mean | 591.52905 |
| Median Absolute Deviation (MAD) | 313 |
| Skewness | 0.75153112 |
| Sum | 5142162 |
| Variance | 259590.03 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 289 | 94 | 1.1% |
| 82 | 28 | 0.3% |
| 21 | 27 | 0.3% |
| 90 | 23 | 0.3% |
| 0 | 22 | 0.3% |
| 86 | 22 | 0.3% |
| 19 | 22 | 0.3% |
| 176 | 21 | 0.2% |
| 97 | 21 | 0.2% |
| 56 | 21 | 0.2% |
| Other values (1807) | 8392 |
| Value | Count | Frequency (%) |
| 0 | 22 | |
| 1 | 15 | |
| 2 | 11 | |
| 3 | 16 | |
| 4 | 7 | 0.1% |
| 5 | 13 | |
| 6 | 12 | |
| 7 | 9 | |
| 8 | 13 | |
| 9 | 16 |
| Value | Count | Frequency (%) |
| 1894 | 1 | |
| 1893 | 1 | |
| 1892 | 1 | |
| 1891 | 1 | |
| 1888 | 2 | |
| 1886 | 1 | |
| 1884 | 1 | |
| 1880 | 1 | |
| 1878 | 1 | |
| 1877 | 1 |
Cabin_port
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 0 | |
|---|---|
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 4403 | |
| 1 | 4290 |
Destination
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 2 | |
|---|---|
| 0 | |
| 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 2 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 6082 | |
| 0 | 1815 | 20.9% |
| 1 | 796 | 9.2% |
Age
Real number (ℝ)
| Distinct | 80 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.58921 |
| Minimum | 0 |
|---|---|
| Maximum | 79 |
| Zeros | 178 |
| Zeros (%) | 2.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 19 |
| median | 27 |
| Q3 | 37 |
| 95-th percentile | 55 |
| Maximum | 79 |
| Range | 79 |
| Interquartile range (IQR) | 18 |
Descriptive statistics
| Standard deviation | 14.451078 |
|---|---|
| Coefficient of variation (CV) | 0.50547314 |
| Kurtosis | 0.11527462 |
| Mean | 28.58921 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 0.44745822 |
| Sum | 248526 |
| Variance | 208.83365 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 18 | 363 | 4.2% |
| 21 | 328 | 3.8% |
| 24 | 324 | 3.7% |
| 19 | 317 | 3.6% |
| 22 | 294 | 3.4% |
| 23 | 292 | 3.4% |
| 20 | 284 | 3.3% |
| 26 | 268 | 3.1% |
| 28 | 267 | 3.1% |
| 27 | 260 | 3.0% |
| Other values (70) | 5696 |
| Value | Count | Frequency (%) |
| 0 | 178 | |
| 1 | 67 | 0.8% |
| 2 | 75 | |
| 3 | 75 | |
| 4 | 71 | 0.8% |
| 5 | 34 | 0.4% |
| 6 | 40 | 0.5% |
| 7 | 53 | 0.6% |
| 8 | 50 | 0.6% |
| 9 | 44 | 0.5% |
| Value | Count | Frequency (%) |
| 79 | 3 | < 0.1% |
| 78 | 3 | < 0.1% |
| 77 | 2 | < 0.1% |
| 76 | 2 | < 0.1% |
| 75 | 4 | |
| 74 | 5 | |
| 73 | 7 | |
| 72 | 4 | |
| 71 | 7 | |
| 70 | 9 |
VIP
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 0 | |
|---|---|
| 1 | 199 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 8494 | |
| 1 | 199 | 2.3% |
RoomService
Real number (ℝ)
| Distinct | 1293 |
|---|---|
| Distinct (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 245.71023 |
| Minimum | 0 |
|---|---|
| Maximum | 14327 |
| Zeros | 5577 |
| Zeros (%) | 64.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 77 |
| 95-th percentile | 1419.4 |
| Maximum | 14327 |
| Range | 14327 |
| Interquartile range (IQR) | 77 |
Descriptive statistics
| Standard deviation | 683.58887 |
|---|---|
| Coefficient of variation (CV) | 2.7820937 |
| Kurtosis | 57.242125 |
| Mean | 245.71023 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.8374018 |
| Sum | 2135959 |
| Variance | 467293.74 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5577 | |
| 1 | 117 | 1.3% |
| 2 | 79 | 0.9% |
| 3 | 61 | 0.7% |
| 4 | 47 | 0.5% |
| 485 | 44 | 0.5% |
| 5 | 28 | 0.3% |
| 9 | 25 | 0.3% |
| 6 | 24 | 0.3% |
| 8 | 24 | 0.3% |
| Other values (1283) | 2667 |
| Value | Count | Frequency (%) |
| 0 | 5577 | |
| 1 | 117 | 1.3% |
| 2 | 79 | 0.9% |
| 3 | 61 | 0.7% |
| 4 | 47 | 0.5% |
| 5 | 28 | 0.3% |
| 6 | 24 | 0.3% |
| 7 | 17 | 0.2% |
| 8 | 24 | 0.3% |
| 9 | 25 | 0.3% |
| Value | Count | Frequency (%) |
| 14327 | 1 | |
| 9920 | 1 | |
| 8586 | 1 | |
| 8243 | 1 | |
| 8209 | 1 | |
| 8168 | 1 | |
| 8151 | 1 | |
| 8142 | 1 | |
| 8030 | 1 | |
| 7406 | 1 |
FoodCourt
Real number (ℝ)
| Distinct | 1526 |
|---|---|
| Distinct (%) | 17.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 459.71506 |
| Minimum | 0 |
|---|---|
| Maximum | 29813 |
| Zeros | 5471 |
| Zeros (%) | 62.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 109 |
| 95-th percentile | 2669.4 |
| Maximum | 29813 |
| Range | 29813 |
| Interquartile range (IQR) | 109 |
Descriptive statistics
| Standard deviation | 1595.7785 |
|---|---|
| Coefficient of variation (CV) | 3.471234 |
| Kurtosis | 74.656669 |
| Mean | 459.71506 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.1580172 |
| Sum | 3996303 |
| Variance | 2546509.2 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5471 | |
| 1 | 116 | 1.3% |
| 2 | 75 | 0.9% |
| 3 | 53 | 0.6% |
| 4 | 53 | 0.6% |
| 575 | 39 | 0.4% |
| 5 | 33 | 0.4% |
| 6 | 31 | 0.4% |
| 9 | 28 | 0.3% |
| 7 | 27 | 0.3% |
| Other values (1516) | 2767 |
| Value | Count | Frequency (%) |
| 0 | 5471 | |
| 1 | 116 | 1.3% |
| 2 | 75 | 0.9% |
| 3 | 53 | 0.6% |
| 4 | 53 | 0.6% |
| 5 | 33 | 0.4% |
| 6 | 31 | 0.4% |
| 7 | 27 | 0.3% |
| 8 | 20 | 0.2% |
| 9 | 28 | 0.3% |
| Value | Count | Frequency (%) |
| 29813 | 1 | |
| 27723 | 1 | |
| 27071 | 1 | |
| 26830 | 1 | |
| 21066 | 1 | |
| 18481 | 1 | |
| 17958 | 1 | |
| 17901 | 1 | |
| 17687 | 1 | |
| 17432 | 1 |
ShoppingMall
Real number (ℝ)
| Distinct | 1124 |
|---|---|
| Distinct (%) | 12.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 175.52203 |
| Minimum | 0 |
|---|---|
| Maximum | 23492 |
| Zeros | 5683 |
| Zeros (%) | 65.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 32 |
| 95-th percentile | 912.4 |
| Maximum | 23492 |
| Range | 23492 |
| Interquartile range (IQR) | 32 |
Descriptive statistics
| Standard deviation | 599.18999 |
|---|---|
| Coefficient of variation (CV) | 3.4137595 |
| Kurtosis | 332.84866 |
| Mean | 175.52203 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.663393 |
| Sum | 1525813 |
| Variance | 359028.64 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5683 | |
| 1 | 153 | 1.8% |
| 2 | 80 | 0.9% |
| 3 | 59 | 0.7% |
| 4 | 45 | 0.5% |
| 5 | 38 | 0.4% |
| 7 | 36 | 0.4% |
| 6 | 34 | 0.4% |
| 13 | 29 | 0.3% |
| 8 | 28 | 0.3% |
| Other values (1114) | 2508 |
| Value | Count | Frequency (%) |
| 0 | 5683 | |
| 1 | 153 | 1.8% |
| 2 | 80 | 0.9% |
| 3 | 59 | 0.7% |
| 4 | 45 | 0.5% |
| 5 | 38 | 0.4% |
| 6 | 34 | 0.4% |
| 7 | 36 | 0.4% |
| 8 | 28 | 0.3% |
| 9 | 28 | 0.3% |
| Value | Count | Frequency (%) |
| 23492 | 1 | |
| 12253 | 1 | |
| 10705 | 1 | |
| 10424 | 1 | |
| 9058 | 1 | |
| 7810 | 1 | |
| 7185 | 1 | |
| 7148 | 1 | |
| 7104 | 1 | |
| 6805 | 1 |
Spa
Real number (ℝ)
| Distinct | 1378 |
|---|---|
| Distinct (%) | 15.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 357.46555 |
| Minimum | 0 |
|---|---|
| Maximum | 22408 |
| Zeros | 5324 |
| Zeros (%) | 61.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 89 |
| 95-th percentile | 2168.2 |
| Maximum | 22408 |
| Range | 22408 |
| Interquartile range (IQR) | 89 |
Descriptive statistics
| Standard deviation | 1177.323 |
|---|---|
| Coefficient of variation (CV) | 3.2935287 |
| Kurtosis | 68.15581 |
| Mean | 357.46555 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.8335306 |
| Sum | 3107448 |
| Variance | 1386089.5 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5324 | |
| 1 | 146 | 1.7% |
| 2 | 105 | 1.2% |
| 3 | 53 | 0.6% |
| 5 | 53 | 0.6% |
| 4 | 46 | 0.5% |
| 2938 | 36 | 0.4% |
| 7 | 34 | 0.4% |
| 6 | 33 | 0.4% |
| 8 | 28 | 0.3% |
| Other values (1368) | 2835 |
| Value | Count | Frequency (%) |
| 0 | 5324 | |
| 1 | 146 | 1.7% |
| 2 | 105 | 1.2% |
| 3 | 53 | 0.6% |
| 4 | 46 | 0.5% |
| 5 | 53 | 0.6% |
| 6 | 33 | 0.4% |
| 7 | 34 | 0.4% |
| 8 | 28 | 0.3% |
| 9 | 28 | 0.3% |
| Value | Count | Frequency (%) |
| 22408 | 1 | |
| 18572 | 1 | |
| 16594 | 1 | |
| 16139 | 1 | |
| 15586 | 1 | |
| 15331 | 1 | |
| 15238 | 1 | |
| 14970 | 1 | |
| 13995 | 1 | |
| 13902 | 1 |
VRDeck
Real number (ℝ)
| Distinct | 1330 |
|---|---|
| Distinct (%) | 15.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 322.14817 |
| Minimum | 0 |
|---|---|
| Maximum | 24133 |
| Zeros | 5497 |
| Zeros (%) | 63.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 68.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 67 |
| 95-th percentile | 1657 |
| Maximum | 24133 |
| Range | 24133 |
| Interquartile range (IQR) | 67 |
Descriptive statistics
| Standard deviation | 1150.9937 |
|---|---|
| Coefficient of variation (CV) | 3.5728705 |
| Kurtosis | 82.193077 |
| Mean | 322.14817 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.5665315 |
| Sum | 2800434 |
| Variance | 1324786.4 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 5497 | |
| 1 | 139 | 1.6% |
| 2 | 70 | 0.8% |
| 3 | 56 | 0.6% |
| 5 | 51 | 0.6% |
| 148 | 48 | 0.6% |
| 4 | 47 | 0.5% |
| 6 | 32 | 0.4% |
| 8 | 30 | 0.3% |
| 7 | 29 | 0.3% |
| Other values (1320) | 2694 |
| Value | Count | Frequency (%) |
| 0 | 5497 | |
| 1 | 139 | 1.6% |
| 2 | 70 | 0.8% |
| 3 | 56 | 0.6% |
| 4 | 47 | 0.5% |
| 5 | 51 | 0.6% |
| 6 | 32 | 0.4% |
| 7 | 29 | 0.3% |
| 8 | 30 | 0.3% |
| 9 | 25 | 0.3% |
| Value | Count | Frequency (%) |
| 24133 | 1 | |
| 20336 | 1 | |
| 17306 | 1 | |
| 17074 | 1 | |
| 16337 | 1 | |
| 14485 | 1 | |
| 12708 | 1 | |
| 12685 | 1 | |
| 12682 | 1 | |
| 12424 | 1 |
Name
Categorical
| Distinct | 8473 |
|---|---|
| Distinct (%) | 99.8% |
| Missing | 200 |
| Missing (%) | 2.3% |
| Memory size | 68.0 KiB |
| Anton Woody | 2 |
|---|---|
| Cuses Pread | 2 |
| Ankalik Nateansive | 2 |
| Grake Porki | 2 |
| Carry Contrevins | 2 |
| Other values (8468) |
Length
| Max length | 18 |
|---|---|
| Median length | 15 |
| Mean length | 13.833628 |
| Min length | 7 |
Characters and Unicode
| Total characters | 117489 |
|---|---|
| Distinct characters | 53 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 8453 |
|---|---|
| Unique (%) | 99.5% |
Sample
| 1st row | Maham Ofracculy |
|---|---|
| 2nd row | Juanna Vines |
| 3rd row | Altark Susent |
| 4th row | Solam Susent |
| 5th row | Willy Santantines |
Common Values
| Value | Count | Frequency (%) |
| Anton Woody | 2 | < 0.1% |
| Cuses Pread | 2 | < 0.1% |
| Ankalik Nateansive | 2 | < 0.1% |
| Grake Porki | 2 | < 0.1% |
| Carry Contrevins | 2 | < 0.1% |
| Sus Coolez | 2 | < 0.1% |
| Troya Schwardson | 2 | < 0.1% |
| Apix Wala | 2 | < 0.1% |
| Elaney Webstephrey | 2 | < 0.1% |
| Sharie Gallenry | 2 | < 0.1% |
| Other values (8463) | 8473 | |
| (Missing) | 200 | 2.3% |
Length
| Value | Count | Frequency (%) |
| willy | 20 | 0.1% |
| casonston | 18 | 0.1% |
| oneiles | 16 | 0.1% |
| domington | 15 | 0.1% |
| litthews | 15 | 0.1% |
| browlerson | 14 | 0.1% |
| fulloydez | 14 | 0.1% |
| garnes | 14 | 0.1% |
| cartez | 14 | 0.1% |
| idace | 13 | 0.1% |
| Other values (4880) | 16833 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 12691 | 10.8% |
| a | 10251 | 8.7% |
| n | 9155 | 7.8% |
| (space) | 8493 | 7.2% |
| r | 7707 | 6.6% |
| o | 6563 | 5.6% |
| i | 6456 | 5.5% |
| l | 6231 | 5.3% |
| s | 5299 | 4.5% |
| t | 4552 | 3.9% |
| Other values (43) | 40091 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 92010 | |
| Uppercase Letter | 16986 | 14.5% |
| Space Separator | 8493 | 7.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 12691 | |
| a | 10251 | |
| n | 9155 | |
| r | 7707 | |
| o | 6563 | 7.1% |
| i | 6456 | 7.0% |
| l | 6231 | 6.8% |
| s | 5299 | 5.8% |
| t | 4552 | 4.9% |
| y | 4093 | 4.4% |
| Other values (17) | 19012 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1530 | 9.0% |
| C | 1499 | 8.8% |
| B | 1412 | 8.3% |
| M | 1261 | 7.4% |
| A | 1194 | 7.0% |
| P | 987 | 5.8% |
| H | 911 | 5.4% |
| G | 848 | 5.0% |
| D | 809 | 4.8% |
| W | 742 | 4.4% |
| Other values (15) | 5793 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 8493 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 108996 | |
| Common | 8493 | 7.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 12691 | 11.6% |
| a | 10251 | 9.4% |
| n | 9155 | 8.4% |
| r | 7707 | 7.1% |
| o | 6563 | 6.0% |
| i | 6456 | 5.9% |
| l | 6231 | 5.7% |
| s | 5299 | 4.9% |
| t | 4552 | 4.2% |
| y | 4093 | 3.8% |
| Other values (42) | 35998 |
Common
| Value | Count | Frequency (%) |
| (space) | 8493 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 117401 | |
| None | 88 | 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 12691 | 10.8% |
| a | 10251 | 8.7% |
| n | 9155 | 7.8% |
| (space) | 8493 | 7.2% |
| r | 7707 | 6.6% |
| o | 6563 | 5.6% |
| i | 6456 | 5.5% |
| l | 6231 | 5.3% |
| s | 5299 | 4.5% |
| t | 4552 | 3.9% |
| Other values (42) | 40003 |
None
| Value | Count | Frequency (%) |
| é | 88 |
Transported
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 68.0 KiB |
| 1 | |
|---|---|
| 0 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8693 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8693 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8693 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8693 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 4378 | |
| 0 | 4315 |
Auto
The auto setting is an interpretable pairwise column metric with the following mapping (Variable_type-Variable_type : Method, Range):
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
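For the categorical pairs in the mapping above, a bias-corrected Cramér's V can be sketched as follows. This is a minimal illustration of the correction attributed to Bergsma (2013), not the report's own implementation; the function name `cramers_v_corrected` and the sample data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))   # observed counts per (row, column) cell
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, c = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, c_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical data: identical labels versus a balanced, independent pairing.
same = ["x", "y"] * 50
v_perfect = cramers_v_corrected(same, same)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25,
                                    ["p", "q", "p", "q"] * 25)
print(v_perfect, v_independent)
```

For a numerical-categorical pair, the numerical column would first be binned into categories; the binning strategy is left out of this sketch.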
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
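The calculation described above can be sketched in plain Python. The helper names (`average_ranks`, `correlation`, `spearman_rho`) and the sample data are illustrative assumptions; ties receive the average of their rank positions, which is one common convention.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def correlation(x, y):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rho: the correlation of the rank variables."""
    return correlation(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y is the cube of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r_linear = correlation(x, y)
print(rho, r_linear)
```

Because the cube preserves ordering, the ranks of x and y coincide and ρ reaches its maximum, while the same formula applied to the raw values stays below 1.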
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
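A short sketch of this definition, including a check of the location and scale invariance mentioned above. The function name `pearson_r` and the data are illustrative; the scale factors are kept positive, since a negative factor would flip the sign of r.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_transformed = [10 * a + 3 for a in x]
y_transformed = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_transformed, y_transformed)

print(r_original, r_transformed)
```

Both calls return the same value up to floating-point error, which is the invariance property in practice.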
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
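The pair-counting definition above corresponds to the variant often called tau-a. The sketch below implements exactly that definition; libraries commonly default to tie-adjusted variants, so results may differ when the data contain ties. The function name `kendall_tau_a` and the data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables move in the same direction
        elif sign < 0:
            discordant += 1    # the variables move in opposite directions
        # sign == 0 is a tie in x or y: counted in the total only
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
tau = kendall_tau_a(x, y)
print(tau)
```

The quadratic loop over all pairs is fine for illustration; for large columns an O(n log n) algorithm would be the usual choice.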
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
| | PassengerId | HomePlanet | CryoSleep | Cabin | Cabin_num | Cabin_port | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | 1 | 0 | 1 | 0 | 0 | 2 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | Maham Ofracculy | 0 |
| 1 | 0002_01 | 0 | 0 | 5 | 0 | 1 | 2 | 24 | 0 | 109 | 9 | 25 | 549 | 44 | Juanna Vines | 1 |
| 2 | 0003_01 | 1 | 0 | 0 | 0 | 1 | 2 | 58 | 1 | 43 | 3576 | 0 | 6715 | 49 | Altark Susent | 0 |
| 3 | 0003_02 | 1 | 0 | 0 | 0 | 1 | 2 | 33 | 0 | 0 | 1283 | 371 | 3329 | 193 | Solam Susent | 0 |
| 4 | 0004_01 | 0 | 0 | 5 | 1 | 1 | 2 | 16 | 0 | 303 | 70 | 151 | 565 | 2 | Willy Santantines | 1 |
| 5 | 0005_01 | 0 | 0 | 5 | 0 | 0 | 1 | 44 | 0 | 0 | 483 | 0 | 291 | 0 | Sandie Hinetthews | 1 |
| 6 | 0006_01 | 0 | 0 | 5 | 2 | 1 | 2 | 26 | 0 | 42 | 1539 | 3 | 0 | 0 | Billex Jacostaffey | 1 |
| 7 | 0007_01 | 0 | 0 | 5 | 3 | 1 | 2 | 35 | 0 | 0 | 785 | 17 | 216 | 0 | Andona Beston | 1 |
| 8 | 0008_01 | 1 | 1 | 1 | 1 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | Erraiam Flatic | 1 |
| 9 | 0008_03 | 1 | 0 | 1 | 1 | 0 | 0 | 45 | 0 | 39 | 7295 | 589 | 110 | 124 | Wezena Flatic | 1 |
| | PassengerId | HomePlanet | CryoSleep | Cabin | Cabin_num | Cabin_port | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8683 | 5474_01 | 0 | 1 | 4 | 275 | 0 | 2 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | Thew Strony | 1 |
| 8684 | 0693_01 | 1 | 1 | 4 | 14 | 0 | 0 | 35 | 0 | 0 | 0 | 0 | 2853 | 0 | Mothab Dedometeel | 1 |
| 8685 | 0278_01 | 0 | 0 | 4 | 11 | 0 | 2 | 35 | 0 | 0 | 0 | 0 | 888 | 2648 | Judya Beachez | 0 |
| 8686 | 4637_01 | 2 | 0 | 4 | 221 | 0 | 2 | 24 | 0 | 672 | 0 | 501 | 0 | 34 | Tark Ches | 0 |
| 8687 | 4974_02 | 1 | 1 | 4 | 263 | 0 | 0 | 45 | 0 | 0 | 0 | 0 | 0 | 148 | Lesat Vendeck | 1 |
| 8688 | 8772_02 | 1 | 0 | 3 | 90 | 0 | 0 | 53 | 0 | 0 | 1127 | 0 | 3939 | 400 | Naosura Motled | 0 |
| 8689 | 3821_01 | 0 | 0 | 4 | 309 | 0 | 1 | 35 | 0 | 0 | 2 | 0 | 0 | 867 | Violan Mcphernard | 0 |
| 8690 | 7746_01 | 1 | 1 | 4 | 289 | 0 | 0 | 35 | 0 | 0 | 0 | 0 | 0 | 0 | Antinon Patoetic | 1 |
| 8691 | 4167_01 | 0 | 0 | 4 | 309 | 0 | 1 | 33 | 0 | 0 | 440 | 0 | 0 | 334 | Ninaha Deckerson | 0 |
| 8692 | 2970_01 | 0 | 0 | 5 | 42 | 0 | 2 | 27 | 0 | 740 | 82 | 6 | 628 | 1 | Dwin Adkinson | 0 |